RADAGRAD: Random Projections for Adaptive Stochastic Optimization
Authors
Abstract
We present RADAGRAD, a simple and computationally efficient approximation to full-matrix ADAGRAD based on dimensionality reduction using the subsampled randomized Hadamard transform. RADAGRAD is able to capture correlations in the gradients and achieves a regret similar to that of full-matrix ADAGRAD, both in theory and empirically, but at a computational cost comparable to that of the diagonal variant.
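To make the dimensionality-reduction step concrete, the following is a minimal sketch of a subsampled randomized Hadamard transform (SRHT) applied to a single gradient vector. It is not the authors' implementation. The names `fwht`, `make_srht`, `p`, and `tau` are hypothetical, and the sketch assumes the common SRHT form sqrt(p/tau) · S · H · D with random signs D, an orthonormal Hadamard matrix H, and rows sampled without replacement by S; the abstract does not specify these details.

```python
import numpy as np

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = np.asarray(x, dtype=float).copy()
    n = x.shape[0]
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b          # butterfly: sum half
            x[i + h:i + 2 * h] = a - b  # butterfly: difference half
        h *= 2
    return x

def make_srht(p, tau, rng):
    """Return a function projecting a length-p vector down to tau dimensions."""
    signs = rng.choice([-1.0, 1.0], size=p)        # diagonal of D (assumed Rademacher)
    rows = rng.choice(p, size=tau, replace=False)  # rows kept by S (assumed without replacement)

    def project(g):
        hg = fwht(signs * g) / np.sqrt(p)          # orthonormal H applied to D g
        return np.sqrt(p / tau) * hg[rows]         # rescale so squared norms match in expectation

    return project

rng = np.random.default_rng(0)
p, tau = 1024, 64
project = make_srht(p, tau, rng)
g = rng.standard_normal(p)       # stand-in for a stochastic gradient
g_tilde = project(g)             # tau-dimensional sketch of the gradient
```

The projection costs O(p log p) per gradient through the fast transform, and any full-matrix statistics would then be kept for the `tau`-dimensional `g_tilde` rather than for `g` itself, which is where the savings over a p × p matrix would come from.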
Related resources
Scalable Adaptive Stochastic Optimization Using Random Projections
Adaptive stochastic gradient methods such as ADAGRAD have gained popularity in particular for training deep neural networks. The most commonly used and studied variant maintains a diagonal matrix approximation to second order information by accumulating past gradients which are used to tune the step size adaptively. In certain situations the full-matrix variant of ADAGRAD is expected to attain ...
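The diagonal variant described above can be sketched in a few lines. This is a generic illustration of diagonal ADAGRAD under assumed hyperparameters, not code from the paper; the function name `adagrad_diagonal`, the step size `eta`, the stabilizer `eps`, and the toy quadratic objective are all hypothetical.

```python
import numpy as np

def adagrad_diagonal(grad_fn, x0, eta=0.5, eps=1e-8, steps=200):
    """Diagonal ADAGRAD: per-coordinate steps scaled by accumulated squared gradients."""
    x = np.asarray(x0, dtype=float).copy()
    accum = np.zeros_like(x)                     # running sum of squared gradients
    for _ in range(steps):
        g = grad_fn(x)
        accum += g * g                           # diagonal second-moment accumulator
        x -= eta * g / (np.sqrt(accum) + eps)    # larger past gradients give smaller steps
    return x

# Toy objective f(x) = 0.5 * sum(scales * x**2) with badly scaled coordinates.
scales = np.array([100.0, 1.0])
x_start = np.array([1.0, 1.0])
x_final = adagrad_diagonal(lambda x: scales * x, x_start)
```

Because each coordinate is normalized by its own gradient history, the two coordinates progress at essentially the same rate despite the 100× difference in curvature. A diagonal accumulator cannot, however, represent correlations between coordinates, which is the gap the full-matrix variant is meant to address.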
Considering Stochastic and Combinatorial Optimization
Here, issues connected with characteristic stochastic practices are considered. In the first part, the plausibility of covering the arrangements of an improvement issue on subjective subgraphs is studied. The impulse for this strategy is a state where an advancement issue must be settled as often as possible for discretionary illustrations. Then, a preprocessing stage is considered that would q...
Market Adaptive Control Function Optimization in Continuous Cover Forest Management
Economically optimal management of a continuous cover forest is considered here. Initially, there is a large number of trees of different sizes and the forest may contain several species. We want to optimize the harvest decisions over time, using continuous cover forestry, which is denoted by CCF. We maximize our objective function, the expected present value, with consideration of stochastic p...
An Efficient Adaptive Optimization Scheme
Adaptive optimization schemes based on stochastic approximation principles such as the Random Directions Kiefer-Wolfowitz (RDKW), the Simultaneous Perturbation Stochastic Approximation (SPSA) and the Adaptive Fine-Tuning (AFT) algorithms possess the serious disadvantage of not guaranteeing efficient transient behaviour due to their requirement for using random or random-like perturbations of th...
Medium Term Hydroelectric Production Planning - A Multistage Stochastic Optimization Model
Multistage stochastic programming is a key technology for making decisions over time in an uncertain environment. One of the promising areas in which this technology is implementable, is medium term planning of electricity production and trading where decision makers are typically faced with uncertain parameters (such as future demands and market prices) that can be described by stochastic proc...